Spoken  Dialogue:  Extending  Embedded  Virtual 
Simulation  with  a  Very  Human  Dimension 


Benjamin  Bell 

CHI  Systems,  Inc. 

1035  Virginia  Drive,  Suite  300 
Fort  Washington,  PA  19034  USA 
Tel  +1  215.489.5249/ Fax +1  215.542.1412 

bbell@chisystems.com


Philip  Short 

Aerosystems  International  Ltd. 

Lupin  Way,  Alvington, 

Yeovil  Somerset  BA22  8UZ  UK 
T  +44  1935  443000  /  F  +44  1935  443111 

phil.short@baesystems.com


ABSTRACT 

Embedded  virtual  simulation,  employed  in  many  NATO  training  communities,  is  a  key  to  addressing 
urgent  training  needs  in  areas  that  stretch  the  capabilities  offered  by  conventional  simulation  techniques. 
Of particular relevance to NATO are training needs such as communication and tactical team coordination.
This  paper  will  summarize  some  needs  in  training  and  mission  planning  that  remain  unmet,  discuss  the 
reasons  why  and  propose  some  specific  approaches  that  extend  the  reach  of  simulation  in  directions  that 
directly  address  these  gaps.  We  focus  on  communication  and  tactical  team  training.  We  will  show  specific 
examples  of  our  approaches  that  are  solving  tangible  training  and  rehearsal  problems  among  NATO 
constituencies  and  discuss  how  this  approach  can  be  broadly  applied  across  a  spectrum  of  training  settings. 

1.0  INTRODUCTION/RELEVANCE  TO  THE  WORKSHOP 

Replicating  unfamiliar  areas  of  operation  in  a  synthetic  environment  can  expose  NATO  forces  to  a 
spectrum  of  tactical  possibilities  they  may  encounter  in-theater.  The  use  of  simulation  in  training,  mission 
planning  and  rehearsal  is  a  long-accepted  practice,  but  when  forces  deploy  with  little  notice,  or  to  locales 
with  insufficient  technology  infrastructure,  the  benefits  of  virtual  simulation  remain  beyond  reach. 
Embedded  virtual  simulation  can  help  mitigate  these  challenges  faced  by  NATO  forces  when  rapidly 
deploying  to  new  locations,  by  providing  training,  mission  planning  and  rehearsal  capabilities  along  with 
the  digital  systems  employed  operationally. 

Some  skills  have  been  overlooked  in  embedded  simulation,  namely,  communication  and  team 
coordination.  Such  skills  are  gaining  increasing  importance,  as  forces  are  more  multinational  and  as 
missions  are  increasingly  conducted  against  an  asymmetric  adversary  and  complicated  by  proximity  to  and 
political  reliance  on  non-combatants.  This  training  gap  is  due  largely  to  the  complex  and  highly  verbal 
interactions  needed  to  incorporate  spoken  dialogue  into  synthetic  environments.  Nonetheless,  this  very 
human  dimension  remains  a  critical  part  of  realistic  training,  planning  and  rehearsal.  In  this  paper  we 
report  on  work  to  embed  speech-interactive  synthetic  agents  into  purpose-built  training,  mission  planning 
and  rehearsal  systems. 

2.0  RATIONALE 

The  most  widely-practiced  method  for  training  effective  coordination  and  communication  is  in  the  course 
of  live  or  simulated  exercises,  where  teams  engaged  in  a  tactical  scenario  learn  to  work  together  in  pursuit 
of  the  mission.  This  technique  has  the  advantage  of  realism,  since  the  team  members  are  interacting  with  a 
population  very  similar  to  what  they  will  encounter  in  the  field.  Despite  the  belief  that  such  techniques 
deliver  effective  training,  there  are  cost  and  access  penalties  incurred  by  live  or  virtual  team  exercises. 


RTO-MP-HFM-169




Report  Documentation  Page 




1.  REPORT  DATE 

OCT  2009 



12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release,  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

See also ADA562526. RTO-MP-HFM-169 Human Dimensions in Embedded Virtual Simulation (Les dimensions humaines dans la simulation virtuelle intégrée). The original document contains color images.



16. SECURITY CLASSIFICATION OF: a. REPORT unclassified; b. ABSTRACT unclassified; c. THIS PAGE unclassified

17. LIMITATION OF ABSTRACT: SAR

18. NUMBER OF PAGES: 10

Standard Form 298 (Rev. 8-98), prescribed by ANSI Std Z39-18




1.  Many  in  the  exercise  may  not  be  getting  effective  training.  Such  “role-players”  are  needed  to 
perform  the  tasks  necessary  for  the  simulation  to  be  credible  and  effective;  in  other  words,  to 
provide  behavioral,  aural,  or  visual  cueing  to  the  user  to  simulate  how  the  team  would  be 
functioning; 

2.  It  introduces  variability  that  makes  standardization  of  training  difficult  due  to  the  human  element 
influencing  events  in  each  scenario; 

3.  It  interferes  with  performance  assessment,  since  it  is  often  the  instructors  themselves  who  are 
called  upon  to  divide  their  attention  between  evaluating  trainees  and  playing  roles; 

4. Costs arise from compensating, transporting and maintaining role-players at a training facility;

5. Availability is compromised because expert role-players can be exceedingly difficult to arrange.

The consequence is that access to team training is rationed: it must be scheduled in advance and is conducted only at dedicated facilities. Since such training is offered principally at home stations, deployed forces can suffer steep drop-offs in readiness, as any skills that are not practiced while deployed experience sharp decay (Chatham & Braddock, 2001). These challenges affect military training across a broad spectrum of skills but are especially salient in communication and coordination, which, broadly speaking, are under-represented in training as well as in the technology to support such training.

The  application  of  simulation  technology  to  training,  mission  planning  and  rehearsal  has  enabled  realistic 
overhead  2-D  and  immersive  3-D  “fly-through”  capabilities  that  can  help  improve  training  and  better 
prepare  tactical  teams  for  conducting  missions  in  unfamiliar  locales.  Detailed  terrain  data  can  offer  a 
preview  of  the  relevant  landmarks  and  hazards,  and  threat  models  can  provide  a  more  comprehensive 
glimpse  of  potential  hot  zones  and  safety  corridors.  A  further  extension  of  the  utility  of  such  techniques 
would  allow  users  to  perform  the  radio  communications  and  team  coordination  planned  for  a  mission;  that 
is,  the  coordination  that  is  critical  to  the  success  of  NATO  combat  missions  such  as  close  air  support 
(CAS).  Such  practice  opportunities,  while  valuable,  are  limited  by  the  inescapable  scarcity  of  complete 
mission  teams  to  gather  in  space  and  time  during  training,  planning  and  rehearsal  cycles.  Below  we 
discuss  this  gap  as  observed  in  two  contexts:  pilot  communications  training,  and  CAS  planning  and 
rehearsal. 

2.1  Communications  Training 

Effective  communication  with  team  members  is  an  essential  element  in  accomplishing  the  mission. 
Opportunities  to  practice  communication  skills  are  limited  to  live  or  simulated  team  exercises,  with  the 
members of the team either present, or replaced by role-player surrogates. This approach is subject to the
cost  and  access  limitations  described  previously.  As  a  result,  training  programs  often  suffer  penalties  due 
to  ineffective  use  of  team  or  simulator  time,  and/or  poor  student  performance  leading  eventually  to 
washout,  due  to  the  paucity  of  practice  opportunities  in  team  communication. 

One  example  of  a  communication  training  need  comes  from  USAF  specialized  undergraduate  pilot 
training  (SUPT),  an  intensive  program  that  trains  new  pilots  prior  to  assignment  to  an  advanced  training 
track.  This  initial  phase  presents  student  pilots  with  an  array  of  complex  skills  to  acquire  and  integrate  in  a 
high-pressure,  time-sensitive  programme.  Current  approaches  that  augment  the  minimal  flying  hours  with 
simulation  devices  have  not  succeeded  in  providing  the  interactivity  required  for  some  skills  (particularly 
those  requiring  communication).  As  a  result,  training  gaps  have  emerged  in  the  SUPT  syllabus  that 
include  pattern  operations  and  radio  communications  (AFRL,  2002).  The  consequence  is  that  training 
benefit  from  time  in  the  airplane  is  compromised  whenever  an  instructor  is  obliged  to  review  skills  and 
concepts  (like  communication)  that  might  have  been  mastered  if  appropriate  simulation  technology  were 
available. 




2.2  Tactical  Team  Mission  Planning 

Tactical  team  mission  planning  and  rehearsal,  though  a  ubiquitous  practice,  is  typically  conducted  with 
live  or  virtual  teams,  with  few  opportunities  to  practice  mission  coordination  skills  outside  of  scheduled 
runs. The need to pre-arrange members of the team or role-player surrogates introduces cost and access
limitations described previously. An instance of this gap is in mission planning and rehearsal for
coordination-intensive missions such as CAS. With friendly forces and non-combatants in close proximity
to targets, mission success requires effective verbal interactions between the air assets and the observers
on the ground. In previous work we have demonstrated the use of speech-capable synthetic teammates for
CAS  training  (Bell,  Johnston,  Freeman  &  Rody,  2004).  Mission  planning  and  rehearsal  require  similar 
capabilities,  and  should  allow  users  to  practice  the  radio  communication  along  with  the  other  aspects  of 
mission  performance.  In  CAS,  for  instance,  the  air-ground  coordination  is  critical  to  the  success  and  safety 
of  the  mission  and  should  be  represented  in  walk-through/fly-through  activities.  Unfortunately  this  is 
seldom  the  practice,  due  largely  to  the  separation  in  time  and  space  of  the  respective  staffs  in  the  air  and 
ground  elements  planning  and  rehearsing  the  mission. 


3.0  DESCRIPTION  OF  METHODS  EMPLOYED  AND  RESULTS  OBTAINED 

In  order  to  meet  the  challenges  summarized  above,  traditional  simulation  must  be  augmented  with  robust, 
verbally-interactive  synthetic  agents.  Such  agents  must  possess  capabilities  that  extend  well  beyond 
conventional  computer-generated  forces  (CGFs),  semi-automated  forces  (SAFs),  and  game-based  artificial 
intelligence,  or  “AI”s  -  largely  scripted  entities  with  limited  abilities  to  respond  to  events  beyond  a 
predefined  range  of  simple  behaviors.  Entities  driven  by  CGFs,  SAFs,  or  AIs  cannot  model  the  real-world 
complexities  necessary  to  provide  training  value  at  the  level  of  individual  interaction.  To  provide 
interaction  effectively  for  team  training,  synthetic  teammates  require  the  following  capabilities: 

1. simultaneous execution of: taskwork (e.g., flying the aircraft, working the console); teamwork
(interacting with other members of the team); and instruction (providing assessment and feedback);

2.  interaction  via  spoken  language  (required  for  team  training  in  verbal  environments);  and 

3.  modulating  behaviours  to  replicate  various  error  modes,  to  allow  for  varying  the  proficiency  of  the 
synthetic  team  members  (important  in  team  training). 

These generic requirements extend well beyond what CGFs, SAFs, and game-based AIs, as largely scripted entities, can deliver. CGF/SAF technologies do have an important role to play, but for our purposes they fall short of addressing specific needs that remain unmet. To meet these needs, we are employing cognitive modeling using CHI Systems’ computational development tool, iGEN®, for encapsulating human expertise and behavior in synthetic agents (Zachary, LeMentec & Ryder, 1996). Sophisticated agents, such as those that can be built using iGEN, can serve as dialogue-capable synthetic teammates, reducing reliance on human role-players and making training, mission planning and rehearsal more accessible, less costly, and more standardized. Below we summarize two recent implementations of this technique, addressing the needs presented above: communications training, and tactical team mission planning and rehearsal, respectively.

3.1  Communications  Training 

USAF Joint Primary Pilot Training (JPPT) teaches flying principles and techniques to student pilots at dedicated training bases, where, due to the number of aircraft operating in proximity to the field, there is a standard traffic pattern and established procedures for requesting the overhead pattern to maximize opportunities to practice landings. Pilot-controller radio communications in the traffic pattern follow a
specific  protocol  to  minimize  radio  congestion  and  enhance  comprehension.  It  is  important  for  the 
students  to  learn  and  use  standard  phraseology  for  these  purposes.  Furthermore,  the  communications 
between  other  pilots  and  the  controllers  provide  an  important  source  of  situational  awareness  as  they 
include  position  reports  and  clearance  requests.  Thus,  part  of  learning  radio  communications  is  learning  to 
develop  situational  awareness  from  listening  to  radio  calls  of  other  pilots  in  the  pattern. 

However,  these  very  skills  were  identified  as  training  gaps  in  an  AFRL  study  (AFRL,  2002).  Not 
surprisingly,  the  high-fidelity  training  devices  employed  in  primary  pilot  training  make  no  accommodation 
for  communications  training,  other  than  a  helpful  simulator  instructor  issuing  occasional  commands  to 
simulate  a  controller,  nor  is  there  simulated  radio  traffic.  To  address  this  gap,  AFRL  and  CHI  Systems 
developed  the  Virtual  Interactive  Pattern  Environment  and  Radiocomms  Simulator  (VIPERS). 

VIPERS  offers  users  opportunities  for  guided  practice  and  feedback  in  radio  communications  skills  and 
decision  making  in  a  simulated  pattern  environment  (Bell,  Ryder  &  Pratt,  2008).  The  format  of  this 
practice  is  simulation-based  training  with  intelligent  software  agents  performing  in  both  tutoring  roles  and 
synthetic  teammate  roles,  in  a  laptop-based  portable  application  for  anytime/anywhere  training 
enrichment.  The  core  training  technique  in  VIPERS  is  scenario-based  guided  practice  (Fowlkes,  Dwyer, 
Oser  &  Salas,  1998;  Schank,  Fano,  Bell  &  Jona,  1994)  in  a  simulated  traffic  pattern.  Specifically,  VIPERS 
provides  three  types  of  speech-interactive  synthetic  entities: 

1. a synthetic instructor that provides coaching and feedback during scenarios and makes
assessments  to  be  used  in  a  debrief; 

2.  a  synthetic  controller  that  maintains  knowledge  of  all  aircraft  in  the  pattern  and  verbally  responds 
to  clearance  requests  and  issues  directives  to  all  aircraft  in  the  pattern;  and 

3.  synthetic  pilots/aircraft  in  the  pattern  behaving  appropriately  and  making  radio  calls. 

Figure  1  illustrates  the  display  during  a  VIPERS  scenario.  The  mission  display  is  a  top-down  schematic 
view  of  the  pattern  with  aircraft  icons  representing  the  pattern  traffic.  In  the  mission,  the  user  commands 
the  aircraft  and  makes  radio  calls  as  if  flying  the  airplane.  The  user  controls  the  aircraft  using  high-level 
controls  indicated  by  buttons  that  the  user  can  select  either  via  mouse  or  keyboard.  In  addition,  the  user 
has  a  headset  with  microphone  for  transmitting  and  receiving  radio  communications. 




Figure 1: Example screen from the VIPERS communications training program.

The  simulation  includes  synthetic  aircraft  flying  in  the  pattern  (represented  by  aircraft  icons  on  the 
display)  with  synthetic  pilots  making  the  appropriate  radio  communications.  It  also  includes  a  synthetic 
controller  responding  to  clearance  requests  and  issuing  directives  to  all  aircraft  in  the  pattern.  The 
synthetic  instructor  provides  coaching  and  short  feedback  as  appropriate,  reminding  the  user  to  make 
missed  calls,  and  assuming  temporary  control  of  the  aircraft  if  needed.  At  the  conclusion  of  the  mission,  a 
debrief  is  provided  to  the  user,  reviewing  the  user’s  performance  on  the  following  four  performance 
measures:  (1)  making  correct  radio  transmissions;  (2)  proper  performance  of  in-flight  checks;  (3)  taking 
appropriate  actions  in  decision  situations;  and  (4)  complying  with  directives.  A  representative  transcript  is 
shown  in  Figure  2. 

VIPERS provides instructor-optional guided practice and feedback in radio communications skills and decision making in the JPATS pattern. The combination of PC-based simulation and intelligent, speech-interactive synthetic teammates increases training availability and reduces dependence on instructors. Data collected from 70 users over a five-month period show statistically significant training gains from using VIPERS. Specifically, VIPERS use correlated significantly with reduced time to achieve a rating of “good” on flown sorties for all three measures (situational awareness, communications, and in-flight checks) identified by the Air Force as being relevant. This program thus illustrates how speech-interactive synthetic teammates can offer solutions for training tactical communications skills.




User: Texan one-five request closed

RSU: Closed Approved

IP: You need to call closed downwind

User: Texan one-five closed downwind

SSP1: Tiger two-three initial request high key

RSU: Report high key

User: Below one-fifty, gear clear

IP: Clear

User: Texan one-five gear down

User: Handle down, three green, flaps take-off

IP: Confirm

SSP2: Lush four-two, two miles, gear down

User: Gear up, lights out, flaps up by one-fifty

User: Texan one-five breakpoint straight through

IP: Disregard

User: Texan one-five request closed

RSU: Negative closed

IP: I have the aircraft

Figure 2: Representative dialogue among Texan one-five (user), and synthetic agents: RSU
(controller), student pilots (SSP) and instructor pilot (IP)


3.2  Tactical  Team  Mission  Planning  and  Rehearsal 

In related work we are applying some of the capabilities we developed in the training domain to explore more realistic and more accessible mission planning and rehearsal tools. Our focus was on users in high OPTEMPO contexts, engaged in missions requiring a great deal of teamwork. We looked particularly at cases where teams are distributed and where verbal communication plays a key role in mission coordination, selecting CAS for this study. To accelerate our research, we employed a fielded mission planning and rehearsal tool, so that we could devote our attention to investigating the utility of speech-interactive synthetic teammates rather than to creating a suitable testbed. The tool we employed is called the Combined Arms Gateway Environment (CAGE). Developed by Aerosystems International (AeI), CAGE is a mission support tool
that  enables  operators  to  plan,  rehearse  and  then  conduct  missions  under  a  wide  variety  of  operational 
conditions.  CAGE  allows  planners  to  employ  the  rehearsal  capability  to  create  routes,  inspect  and 
deconflict  airspace,  view  corridors  and  define  threat  cones.  Planners  and  mission  personnel  can  view  the 
mission  in  2-D  (top-down)  and  3-D.  The  3-D  view  provides  dynamic  lighting  (sun,  shade,  moonlight)  to 
assess  the  tactical  implications  of  time  of  day  and  visibility  effects  (fog,  haze,  cloudbase)  to  project  the 
visibility  under  the  forecast  weather  conditions. 

A high-level needs analysis was performed for a CAS scenario. In keeping with the exploratory nature of this research, the analysis was limited in scope and focused specifically on voice interaction. It entailed performing a Hierarchical Task Analysis (HTA) for the scenario, and reviewing each relevant step [1] to identify:

• The objective for that step;

• How to gauge that the objective has been achieved, i.e., the measure of effectiveness (MoE);

• The required inputs for that step (what the instructor has to include over and above the synthetic agent component in order to accomplish the step);

• The specific benefits that the synthetic agent provides, which would not have been achieved by other means (e.g., by displaying the dialogue as text on a screen);

• What the technology must be able to do in order to provide the required benefit.


[1] By 'relevant step' we mean those steps that involve the user doing something, as the HTA also covers the actions of the Joint Terminal Attack Controller (i.e., the actor being 'played' by the synthetic agent).




The  results  of  the  HTA  were  captured  against  the  following  criteria  (example  outcomes  shown  in 
brackets): 

• Task: [Look for described area and features];

• Objective: [Rapidly and accurately identify areas based on description of the visual scene];

• MoE: [Identify target within elapsed time parameters];

• Required inputs: [A representation of the visual scene that relates to the descriptions being provided];

• Benefit: [Synthetic agent allows natural interaction between user and JTAC, with correct sensory input (auditory) and output (speech)];

• Requirement for agent: [Able to provide descriptions that relate to the visual scene provided].
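
The six criteria above lend themselves to a simple record per HTA step. The following is a minimal sketch, assuming Python; the class name HtaStepRecord, its field names and the missing_criteria helper are illustrative choices and are not taken from the tooling used in the study. The example values are the bracketed outcomes listed above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HtaStepRecord:
    """One reviewed HTA step, captured against the six criteria (names are hypothetical)."""
    task: str
    objective: str
    moe: str                # measure of effectiveness
    required_inputs: str
    benefit: str
    agent_requirement: str

    def missing_criteria(self):
        """Return the names of any criteria left blank for this step."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Example populated from the bracketed outcomes in the list above.
EXAMPLE_STEP = HtaStepRecord(
    task="Look for described area and features",
    objective="Rapidly and accurately identify areas based on description of the visual scene",
    moe="Identify target within elapsed time parameters",
    required_inputs="A representation of the visual scene that relates to the descriptions being provided",
    benefit="Synthetic agent allows natural interaction between user and JTAC",
    agent_requirement="Able to provide descriptions that relate to the visual scene provided",
)
```

A record of this kind makes it easy to flag steps whose analysis is incomplete before scenario authoring begins.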


To  bound  our  initial  experiment,  we  created  a  set  of  CAS  scenarios,  focusing  on  dialogue  between  the 
pilot  and  a  Joint  Terminal  Attack  Controller  (JTAC),  allowing  for  alternative  dialogue  branches  and  error 
correction.  The  complexity  of  the  scenarios  determines  the  necessary  sophistication  of  the  grammar, 
synthesized  voice,  and  agent  model.  For  this  exploratory  effort,  the  scenarios  were  limited  to  specific 
phases  of  a  representative  CAS  mission.  We  created  an  iGEN  model  to  play  the  role  of  the  JTAC. 


The  implemented  scenario  demonstrates  a  mission  rehearsal  where  the  user  takes  on  the  role  of  lead  CAS 
pilot,  interacting  with  a  synthetic  JTAC  agent.  When  a  scenario  is  started,  the  components  load  their 
required  data  (CAGE  loads  its  scenario  data,  the  speech  components  load  the  grammar  and  voice  data,  and 
iGEN  loads  the  JTAC  model)  and  each  initializes  the  appropriate  communication  channels.  The  user 
selects  a  call  sign  from  a  set  of  nominal  identifiers  and  two-digit  suffixes.  The  user  then  begins  the 
mission and initiates communication by checking in with the chosen call-sign. Figure 3 shows a representative display at this point in the mission, with a 3-D view on the left and the 2-D view on the right.


Figure  3:  Representative  rehearsal  display  in  CAGE 


The JTAC agent transmits a 9-line brief, based on information given to it by CAGE (the user can request a re-transmit at any point during the mission). The user then reads back the 9-line and this read-back is checked by the synthetic JTAC for accuracy. If an error is found in the read-back, the user is asked to repeat the incorrect portions of the communication, and only those portions, until they are correct. Following accomplishment of the 9-line, the JTAC agent directs the user to the target; the user must read back the targeting information, which is again checked for accuracy. Following an accurate read-back, the JTAC
clears  the  user  for  attack.  After  attack  the  JTAC  responds  with  a  battle  damage  assessment,  and  the  user 
signs  off.  During  each  exchange  in  the  dialogue  the  JTAC  waits  for  the  appropriate  response  from  the 
user,  and  asks  the  user  to  repeat  any  communication  that  is  incorrect  or  unrecognizable.  A  representative 
transcript  is  shown  in  Figure  4. 

User: Widow 76 this is Vader 28 checking in as fragged

JTAC: Vader 28, Widow 76 Loud and clear, this is a Type 1 control, call ready to copy.

User: Vader 28 Type 1 control, ready copy

JTAC: IP U278, Heading 055 magnetic, Distance 9260 meters, Elevation 70 feet. Target is a Helicopter
parked on western edge of dispersal. Location North 51 00.89 West 002 38.01. Mark Laser 1111
LTL 355 Magnetic. Friendlies 1000 South, Egress North to Bad Wolf. Advise when ready for
remarks

User: Ready to copy remarks

JTAC: Final attack heading 055 through 030

User: Elevation 70 feet, Location North 51 00.89 West 002 38.01. Friendlies 1 km South. Laser 1111
LTL 355 magnetic. Attack heading 055 through 030 magnetic

JTAC: Readback correct, report leaving IP

User: Leaving IP, abort alfa romeo sierra

JTAC: Widow 76, abort alfa, romeo, sierra your target is one of 2 helicopters on the western edge of a
dispersal.

User: Helicopter, western edge, dispersal. Vader 28 leaving IP.

JTAC: Short of target, airfield

User: Short of target, airfield

JTAC: North of runways, group of 8 hangars. From there, 12 o’clock 500, further set of 3 hangars, North
East corner airfield. Laser on. Friendlies to South of all runways.

User: Contact 10 seconds. Further 3 hangars Laser on. Visual friendlies

JTAC: Right of hangars is large dispersal, in sunlight, target is helicopter on right hand side

User: Contact Target, left of target further helicopter against building.

JTAC: Affirm, cleared hot

User: In hot. Rifle away. Terminate

JTAC: Terminate, Vader 28, Widow 76, Delta Hotel, helicopter destroyed, End of mission.

User: Target destroyed, Delta Hotel, End of Mission.

Figure  4:  Representative  dialogue  between  aircraft  (user)  and  JTAC  agent 
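
The selective read-back check described above (only the incorrect portions are requested again) can be illustrated with a short sketch. This is not the iGEN JTAC model; it is a minimal Python illustration under stated assumptions: the brief and each read-back are taken to be already parsed into named items, the item names and the "Say again" prompt wording are hypothetical, and comparison is a plain normalized string match, which is stricter than the variation the agent actually tolerates.

```python
def normalize(text):
    """Lower-case and collapse whitespace and commas so trivial differences are ignored."""
    return " ".join(text.lower().replace(",", " ").split())


def incorrect_items(brief, readback, required):
    """Return the required items that are missing from the read-back or do not match the brief."""
    return [item for item in required
            if normalize(readback.get(item, "")) != normalize(brief[item])]


def run_readback(brief, required, attempts):
    """Step through successive read-back attempts.

    After each attempt, only the items still incorrect remain pending,
    so the user is never asked to repeat portions already read back correctly.
    Returns the list of prompts the (hypothetical) agent would issue.
    """
    prompts = []
    pending = list(required)
    for spoken in attempts:
        pending = incorrect_items(brief, spoken, pending)
        if not pending:
            prompts.append("Readback correct")
            return prompts
        prompts.append("Say again " + ", ".join(pending))
    return prompts
```

For example, if the first read-back transposes two digits of the target location, the only prompt issued concerns the location, and a second attempt containing just the corrected location completes the exchange.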


An important design consideration is how much variability to tolerate when deciding whether a user
utterance is treated as “legal”. Too restrictive an approach erroneously emphasizes syntax over
semantics, frustrates users, and undermines mission planning and rehearsal objectives. Too
accommodating an approach not only adds complexity to the recognition process but could also
introduce non-doctrinal phraseology. There is no quick-fix solution; striking a proper balance depends
on thoughtful, comprehensive consultations with subject matter experts, guided by a principled
cognitive task analysis methodology (e.g., Zachary, Ryder & Hicinbothom, 2000). For our exploratory
study we employed a CAS-rated RAF pilot and implemented logic in the JTAC agent that permits lexical
and syntactic variations based on the tactical context. Each communication spoken by the user can
thus be phrased in different ways; optional wording can be omitted and some alternate wordings are
accepted.
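The idea of optional and alternate wordings can be illustrated with a minimal sketch. It is not the JTAC agent's actual logic, which this paper does not detail; the slot template, the function names, and the alternate wordings ("departing", "initial point") are assumptions made purely for illustration and are not offered as doctrinal phraseology.

```python
import re

# Hypothetical template for the "leaving IP" call. Each slot holds a set of
# accepted alternatives and a flag saying whether the slot is required.
LEAVING_IP = [
    ({"vader 28"}, False),             # callsign may be omitted
    ({"leaving", "departing"}, True),  # alternate wordings (illustrative only)
    ({"the"}, False),                  # optional filler word
    ({"ip", "initial point"}, True),
]


def normalize(utterance):
    """Lower-case the utterance and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", utterance.lower())


def matches(tokens, slots):
    """Return True if the tokens satisfy the slot sequence exactly.

    Each alternative of the first slot is tried in turn; an optional slot
    may also be skipped. Recursion provides the backtracking.
    """
    if not slots:
        return not tokens  # legal only if nothing is left over
    alternatives, required = slots[0]
    for alternative in alternatives:
        words = alternative.split()
        if tokens[:len(words)] == words and matches(tokens[len(words):], slots[1:]):
            return True
    return (not required) and matches(tokens, slots[1:])


if __name__ == "__main__":
    for spoken in ("Vader 28 leaving IP.", "Departing the initial point", "Vader 28 IP"):
        print(spoken, "->", matches(normalize(spoken), LEAVING_IP))
```

Widening or narrowing the sets of alternatives is one way to tune the balance between an overly restrictive and an overly accommodating grammar; in practice those choices would come from the subject matter experts.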

This flexible grammar, combined with selective requests for read-back (i.e., only incorrect portions
of the 9-line need be repeated), affords the user a transparent dialogue capability. For the initial
work reported here, we developed a speaker-independent demonstration that required no training to a
specific voice. Our testing team consisted of both U.K. and U.S. speakers and there were no noticeable
differences in recognition rates among them.
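Selective read-back can be sketched as a field-by-field comparison between what the agent briefed and what the user read back. The following is a simplified illustration under stated assumptions: the field names and values are placeholders rather than a complete or authoritative 9-line, and the function name is hypothetical.

```python
def fields_to_repeat(briefed, read_back):
    """Return the names of briefed fields that were read back incorrectly.

    A field counts as incorrect if it is missing from the read-back or if
    its value differs after case and whitespace are normalized.
    """
    def norm(value):
        return " ".join(str(value).lower().split())

    return [
        name
        for name, value in briefed.items()
        if name not in read_back or norm(read_back[name]) != norm(value)
    ]


if __name__ == "__main__":
    # Placeholder values, invented for this example.
    briefed = {
        "target_elevation": "450 feet",
        "target_description": "helicopter",
        "friendlies": "south 800",
    }
    read_back = {
        "target_elevation": "540 feet",      # digits transposed by the user
        "target_description": "Helicopter",  # differs only in case
        "friendlies": "south 800",
    }
    # Only the elevation needs to be requested again.
    print(fields_to_repeat(briefed, read_back))
```

The agent would then prompt only for the fields returned, rather than asking for the whole brief to be repeated.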




Initial  results  showed  that  there  was  an  immediate  benefit  to  being  able  to  practice  techniques  as  they 
would  be  performed  for  real  while  remaining  in  a  benign  environment.  For  early-stage  training,  this 
removes  the  stress  of  the  real  situation  in  order  to  put  the  trainee  at  ease;  for  planning  and  rehearsal  the 
realism  is  sufficient  to  provide  the  necessary  situational  awareness  to  adequately  exercise  the  plan  and 
measure  an  individual’s  performance  in  executing  it. 

Early  feedback  from  end-users  also  indicates  the  scalability  of  this  technology.  There  is  significant
potential  to  increase  the  richness  of  the  training  experience,  including  using  the  synthetic  agent  to  increase 
the  user’s  exposure  to  operational  stress;  to  augment  the  simulated  environment  with  more  diverse  players 
and  to  provide  voice  interaction  in  situations  where  it  is  not  currently  available. 


4.0  CONCLUSIONS 

The investigations reported here provide support for the utility of speech-interactive synthetic
teammates for training, mission planning and rehearsal. We are currently planning to develop more
comprehensive and more complex scenarios in these domains, which will require more sophisticated
behavioral, speech and grammar components, and in turn more robust speech recognition and discourse
management. We will address this need by employing a dynamic grammar, where an intelligent agent
activates and de-activates sub-grammars as the tactical situation changes, an approach we have
reported in previous work (Bell, Johnston, Freeman & Rody, 2004). Our work has indicated that there is
significant training benefit to be gained from using speech-interactive agents, through increased
richness or improved efficiency of the training environment (Bell, Ryder & Pratt, 2008).
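A dynamic grammar of this kind can be sketched as a set of named sub-grammars that are switched on and off as the tactical phase changes. This is a toy illustration, not the implementation reported in our previous work: the phase names, the mapping from phase to sub-grammars, and the class and method names are all assumptions, and the phrases are borrowed from Figure 4.

```python
# Hypothetical sub-grammars: sets of accepted phrases keyed by name.
SUB_GRAMMARS = {
    "ingress": {"leaving ip"},
    "talk_on": {"contact target", "visual friendlies"},
    "attack": {"in hot", "rifle away", "terminate"},
}

# Hypothetical mapping from tactical phase to the sub-grammars that are live.
PHASE_TO_GRAMMARS = {
    "ingress": {"ingress"},
    "talk_on": {"talk_on"},
    "attack": {"talk_on", "attack"},
}


class DynamicGrammar:
    """Recognizes an utterance only against the currently active sub-grammars."""

    def __init__(self, sub_grammars):
        self.sub_grammars = sub_grammars
        self.active = set()

    def activate(self, name):
        if name not in self.sub_grammars:
            raise KeyError(name)
        self.active.add(name)

    def deactivate(self, name):
        self.active.discard(name)

    def set_phase(self, phase):
        """Switch sub-grammars so that exactly those for the phase are active."""
        wanted = PHASE_TO_GRAMMARS[phase]
        for name in self.active - wanted:
            self.deactivate(name)
        for name in wanted - self.active:
            self.activate(name)

    def recognizes(self, utterance):
        phrase = " ".join(utterance.lower().replace(".", " ").split())
        return any(phrase in self.sub_grammars[name] for name in self.active)


if __name__ == "__main__":
    grammar = DynamicGrammar(SUB_GRAMMARS)
    grammar.set_phase("ingress")
    print(grammar.recognizes("Leaving IP."), grammar.recognizes("In hot"))
    grammar.set_phase("attack")
    print(grammar.recognizes("Leaving IP."), grammar.recognizes("In hot"))
```

Keeping only the relevant sub-grammars active narrows the set of candidate phrases at each point in the mission, which is the property expected to make recognition more robust as scenarios grow.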

New  simulation  capabilities  that  extend  the  benefits  of  synthetic  training  can  yield  parallel  advances  in 
mission  rehearsal  and  mission  planning.  For  missions  that  rely  on  effective  communication  and 
coordination,  though,  the  verbal  exchange  among  tactical  teammates  is  trained,  planned  and  rehearsed 
only  if  and  when  suitable  role-players  are  available,  co-located  in  time  and  place.  By  employing  the
knowledge  encapsulated  in  an  intelligent  agent,  we  can  overcome  many  of  the  challenges  faced  in
human-computer  dialogue,  and  continue  to  enrich  synthetic  training  while  migrating  the  benefits  of  this  approach
into  the  realms  of  mission  planning  and  rehearsal.  The  research  summarized  in  this  paper  offers  evidence 
that  agents  of  sufficient  cognitive  fidelity  to  support  spoken  dialogue  can  extend  embedded  virtual 
simulation  to  achieve  a  new  level  of  readiness  for  NATO  forces  deploying  to  new  locales  with  little 
advance  preparation  time. 


REFERENCES 

[1]  AFRL  (2002).  Advanced  Training  Technology  Needs  for  AETC  Flying  Training,  (Final  Report, 
ETTAP  Project  02-19).  Mesa,  AZ:  Air  Force  Research  Laboratory. 

[2]  Bell,  B.,  Johnston,  J.,  Freeman,  J.,  &  Rody,  F.  (2004).  STRATA:  DARWARS  for  Deployable,
On-Demand  Aircrew  Training.  In  Proceedings  of  the  Interservice/Industry  Training,  Simulation,  and
Education  Conference  (I/ITSEC),  December  2004.

[3]  Bell,  B.,  Ryder,  J.,  &  Pratt,  S.  (2008).  Communications  and  coordination  training  with
speech-interactive  synthetic  teammates:  A  design  and  evaluation  case  study.  In  D.  Vincenzi,  J.  Wise,
P.  Hancock  &  M.  Mouloua  (Eds.),  Human  Factors  in  Simulation  and  Training.  Boca  Raton:  CRC  Press.




[4]  Chatham,  R.E.,  and  Braddock,  J.  (2001).  Report  of  the  Defense  Science  Board  Task  Force  on 
Training  Superiority  and  Training  Surprise  (Washington,  DC:  Office  of  the  Undersecretary  of 
Defense  for  Acquisition,  Technology,  and  Logistics,  2001),  5. 

[5]  Fowlkes,  J.  E.,  Dwyer,  D.,  Oser,  R.  L.,  &  Salas,  E.  (1998).  Event-Based  Approach  to  Training 
(EBAT).  The  International  Journal  of  Aviation  Psychology,  8  (3),  209-221. 

[6]  Schank,  R.C.,  Fano,  A.,  Bell,  B.L.,  &  Jona,  M.K.  (1994).  The  Design  of  Goal  Based  Scenarios.
The  Journal  of  the  Learning  Sciences,  3(4),  305-345.

[7]  Zachary,  W.,  LeMentec,  J.C.,  &  Ryder,  J.  (1996).  Interface  agents  in  complex  systems.  In  C.  Ntuen
&  E.  Park  (Eds.),  Human  Interaction  with  Complex  Systems:  Conceptual  Principles  and  Design
Practice.  Norwell,  MA:  Kluwer  Academic  Publishers,  pp.  35-52.




